159 research outputs found

Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development

Author: Alex Frase
Carrie Moore
Daniel Wolfe
John Wallace
Marylyn D Ritchie
Neerja Katiyar
Sarah A Pendergrass
Publication venue: Springer Nature
Publication date: 30/12/2013

BACKGROUND: The ever-growing wealth of biological information available through multiple comprehensive database repositories can be leveraged for advanced analysis of data. We have now extensively revised and updated the multi-purpose software tool Biofilter, which allows researchers to annotate and/or filter data as well as generate gene-gene interaction models based on existing biological knowledge. Biofilter now has the Library of Knowledge Integration (LOKI) for accessing and integrating existing comprehensive database information, including more flexibility in how ambiguity of gene identifiers is handled. We have also updated the way importance scores for interaction models are generated. In addition, Biofilter 2.0 now works with a range of types and formats of data, including single nucleotide polymorphism (SNP) identifiers, rare variant identifiers, base pair positions, gene symbols, genetic regions, and copy number variant (CNV) location information.
RESULTS: Biofilter provides a convenient single interface for accessing multiple publicly available human genetic data sources that have been compiled in the supporting database of LOKI. Information within LOKI includes genomic locations of SNPs and genes, as well as known relationships among genes and proteins such as interaction pairs, pathways and ontological categories. Via Biofilter 2.0 researchers can:
• Annotate genomic location or region based data, such as results from association studies or CNV analyses, with relevant biological knowledge for deeper interpretation
• Filter genomic location or region based data on biological criteria, such as filtering a series of SNPs to retain only SNPs present in specific genes within specific pathways of interest
• Generate predictive models for gene-gene, SNP-SNP, or CNV-CNV interactions based on biological information, with priority for models to be tested based on biological relevance, thus narrowing the search space and reducing multiple hypothesis testing.
CONCLUSIONS: Biofilter is a software tool that provides a flexible way to use the ever-expanding expert biological knowledge that exists to direct filtering, annotation, and complex predictive model development for elucidating the etiology of complex phenotypic outcomes.
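
The three operations described in this abstract (annotate, filter, generate models) can be illustrated with a minimal Python sketch. This is not Biofilter's interface or LOKI's schema: the gene regions, the source names, the function names, and the scoring rule (counting the knowledge sources that support a gene pair) are all hypothetical assumptions chosen only to show the general idea of knowledge-driven filtering.

```python
from itertools import combinations

# Hypothetical knowledge tables (illustrative only; not LOKI's schema or data).
# gene -> (chromosome, start, end)
GENE_REGIONS = {
    "GENE_A": ("1", 1_000, 2_000),
    "GENE_B": ("1", 5_000, 6_000),
    "GENE_C": ("2", 100, 900),
}
# gene pair -> knowledge sources that group the two genes together
PAIR_SOURCES = {
    frozenset({"GENE_A", "GENE_B"}): {"source_1", "source_2"},
    frozenset({"GENE_A", "GENE_C"}): {"source_1"},
}


def annotate(snps):
    """Map each SNP id to the sorted list of genes whose region contains it."""
    return {
        snp_id: sorted(
            gene
            for gene, (g_chrom, start, end) in GENE_REGIONS.items()
            if g_chrom == chrom and start <= pos <= end
        )
        for snp_id, (chrom, pos) in snps.items()
    }


def filter_by_genes(snps, genes_of_interest):
    """Keep only SNPs that fall inside at least one gene of interest."""
    wanted = set(genes_of_interest)
    return sorted(s for s, genes in annotate(snps).items() if wanted & set(genes))


def gene_gene_models(genes):
    """Rank gene pairs by the number of supporting sources (assumed score).

    Pairs with no supporting source are dropped, which is what narrows
    the search space relative to testing every possible pair.
    """
    models = []
    for a, b in combinations(sorted(genes), 2):
        support = len(PAIR_SOURCES.get(frozenset({a, b}), ()))
        if support:
            models.append((a, b, support))
    return sorted(models, key=lambda m: (-m[2], m[0], m[1]))


# Toy input: SNP id -> (chromosome, base pair position)
snps = {"rs_x": ("1", 1_500), "rs_y": ("1", 3_000), "rs_z": ("2", 500)}
annotated = annotate(snps)
kept = filter_by_genes(snps, {"GENE_A"})
models = gene_gene_models({"GENE_A", "GENE_B", "GENE_C"})
```

In this toy setup only two of the three possible gene pairs have any supporting knowledge, so only those two would be carried forward for statistical testing.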

Springer - Publisher Connector

PubMed Central

Limited Systemic Sclerosis Patients with Pulmonary Arterial Hypertension Show Biomarkers of Inflammation and Vascular Injury

Author: Farber Harrison W.
Farina Giuseppina
Hayes Everett
Lafyatis Robert
Lemaire Raphael
Pendergrass Sarah A.
Whitfield Michael L.
Publication venue: Public Library of Science
Publication date: 01/08/2010

Pulmonary arterial hypertension (PAH) is a common complication for individuals with limited systemic sclerosis (lSSc). The identification and characterization of biomarkers for lSSc-PAH should lead to less invasive screening, a better understanding of pathogenesis, and improved treatment. Forty-nine PBMC samples were obtained from 21 lSSc subjects without PAH (lSSc-noPAH), 15 lSSc subjects with PAH (lSSc-PAH), and 10 healthy controls; three subjects provided PBMCs one year later. Genome-wide gene expression was measured for each sample. The levels of 89 cytokines were measured in serum from a subset of subjects by Multi-Analyte Profiling (MAP) immunoassays. Gene expression clearly distinguished lSSc samples from healthy controls, and separated lSSc-PAH from lSSc-noPAH patients. Real-time quantitative PCR confirmed increased expression of 9 genes (ICAM1, IFNGR1, IL1B, IL13Ra1, JAK2, AIF1, CCR1, ALAS2, TIMP2) in lSSc-PAH patients. Increased circulating levels of inflammatory mediators such as TNF-alpha, IL1-beta, ICAM-1, and IL-6, and of markers of vascular injury such as VCAM-1, VEGF, and von Willebrand Factor, were found in lSSc-PAH subjects. The gene expression and cytokine profiles of lSSc-PAH patients suggest the presence of activated monocytes, and show markers of vascular injury and inflammation. These genes and factors could serve as biomarkers of PAH involvement in lSSc.

Public Library of Science (PLOS)

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

Synthesis-View: visualization and interpretation of SNP association results for multi-cohort, multi-phenotype data and meta-analysis

Author: CJ Willer
Dana C Crawford
JC Barrett
Marylyn D Ritchie
Sarah A Pendergrass
Scott M Dudek
WS Bush
Publication venue: BioMed Central
Publication date: 01/12/2010

Background: Initial genome-wide association study (GWAS) discoveries are being further explored through the use of large cohorts across multiple and diverse populations, involving meta-analyses within large consortia and networks. Many of the additional studies characterize fewer than 100 single nucleotide polymorphisms (SNPs), often include multiple and correlated phenotypic measurements, and can include data from multiple sites, multiple studies, and multiple races/ethnicities. New approaches for visualizing the resulting data are necessary in order to fully interpret results and obtain a broad view of the trends between DNA variation and phenotypes, as well as to provide information on specific SNP and phenotype relationships. Results: The Synthesis-View software tool was designed to visually synthesize the results of the aforementioned types of studies. Presented herein are multiple examples of the ways Synthesis-View can be used to report results from association studies of DNA variation and phenotypes, including the visual integration of p-values or other metrics of significance, allele frequencies, sample sizes, effect size, and direction of effect. Conclusions: Innovative views are needed to allow a user to visually integrate the multiple pieces of information typical of a genetic association study. As a result, we have created "Synthesis-View" software for the visualization of genotype-phenotype association data in multiple cohorts. Synthesis-View is freely available for non-commercial research institutions; for full details see https://chgr.mc.vanderbilt.edu/synthesisview.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Real world scenarios in rare variant association analysis: the impact of imbalance and sample size on the power in silico

Author: Basile Anna O.
Pendergrass Sarah A.
Ritchie Marylyn D.
Zhang Xinyuan
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2019

Background: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, there is a lack of knowledge on the impact of sample size, case numbers, and the balance of cases versus controls on both burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and with the application of statistical methods to rare variants, it is important to understand the strengths and limitations of the analyses. Results: We conducted a large-scale simulation of randomly selected low-frequency protein-coding regions using twelve different balanced samples with an equal number of cases and controls as well as twenty-one unbalanced sample scenarios. We further explored statistical performance of different minor allele frequency thresholds and a range of genetic effect sizes. Our simulation results demonstrate that an unbalanced study design has an overall higher type I error rate for both burden and dispersion tests compared with a balanced study design. Regression has an overall higher type I error with balanced cases and controls, while SKAT has higher type I error for unbalanced case-control scenarios. We also found that both type I error and power were driven by the number of cases in addition to the case to control ratio under large control group scenarios. Based on our power simulations, we observed that a SKAT analysis with case numbers larger than 200 for unbalanced case-control models yielded over 90% power with relatively well controlled type I error. To achieve similar power in regression, over 500 cases are needed. Moreover, SKAT showed higher power to detect associations in unbalanced case-control scenarios than regression.
Conclusions: Our results provide important insights into rare variant association study designs by providing a landscape of type I error and statistical power for a wide range of sample sizes. These results can serve as a benchmark for making decisions about study design for rare variant analyses.
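
To make the notion of a burden test concrete, here is a minimal Python sketch of a collapsing burden test. It is not the authors' pipeline: the abstract refers to regression-based burden tests and SKAT, whereas this sketch swaps in a one-sided Fisher's exact test on carrier counts because it fits in a few lines of standard-library code. The function names and the toy genotypes are hypothetical.

```python
from math import comb


def collapse(genotypes):
    """Burden indicator per person: 1 if any rare allele is carried in the region."""
    return [1 if any(g > 0 for g in person) else 0 for person in genotypes]


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(at least `case_carriers` carriers among cases) under no association.

    Uses the hypergeometric distribution with all margins fixed.
    """
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    upper = min(carriers, n_cases)
    favourable = sum(
        comb(n_cases, k) * comb(n_controls, carriers - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, carriers)


def burden_test(case_genotypes, control_genotypes):
    """Collapse rare variants per person, then test carrier enrichment in cases."""
    case_flags = collapse(case_genotypes)
    control_flags = collapse(control_genotypes)
    return fisher_one_sided(
        sum(case_flags), len(case_flags), sum(control_flags), len(control_flags)
    )


# Toy data: rows are individuals, columns are rare-variant allele counts (0/1/2).
cases = [[0, 1, 0], [1, 0, 0], [0, 0, 2], [0, 0, 0]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
p_value = burden_test(cases, controls)
```

With three carriers among four cases and none among four controls, the p-value is 4/56, about 0.071. That such a strong enrichment is still not significant at 0.05 shows how small case numbers limit what an exact test can detect.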

Columbia University Academic Commons

Directory of Open Access Journals

Investigating the relationship between mitochondrial genetic variation and cardiovascular-related traits to develop a framework for mitochondrial phenome-wide association studies

Author: Dana C Crawford
Eric Farber-Eger
Jacob B Hall
Jonathan Boston
Robert J Goodloe
Sabrina L Mitchell
Sarah A Pendergrass
William S Bush
Publication venue: Springer Nature
Publication date: 01/01/2014

BACKGROUND: Mitochondria play a critical role in the cell and have DNA independent of the nuclear genome. There is much evidence that mitochondrial DNA (mtDNA) variation plays a role in human health and disease; however, this area of investigation has lagged behind research into the role of nuclear genetic variation in complex traits and phenotypic outcomes. Phenome-wide association studies (PheWAS) investigate the association between a wide range of traits and genetic variation. To date, this approach has not been used to investigate the relationship between mtDNA variants and phenotypic variation. Herein, we describe the development of a PheWAS framework for mtDNA variants (mt-PheWAS). Using the Metabochip custom genotyping array, nuclear and mitochondrial DNA variants were genotyped in 11,519 African Americans from the Vanderbilt University biorepository, BioVU. We employed both polygenic modeling and association testing with mitochondrial single nucleotide polymorphisms (mtSNPs) to explore the relationship between mtDNA variants and a group of eight cardiovascular-related traits obtained from de-identified electronic medical records within BioVU. RESULTS: Using polygenic modeling we found evidence for an effect of mtDNA variation on total cholesterol and type 2 diabetes (T2D). After performing comprehensive mitochondrial single SNP associations, we identified an increased number of single mtSNP associations with total cholesterol and T2D compared to the other phenotypes examined, which did not have more significantly associated SNPs than would be expected by chance. Among the mtSNPs significantly associated with T2D we identified variant mt16189, an association previously reported only in Asian and European-descent populations.
CONCLUSIONS: Our replication of previous findings and identification of novel associations from this initial study suggest that our mt-PheWAS approach is robust for investigating the relationship between mitochondrial genetic variation and a range of phenotypes, providing a framework for future mt-PheWAS.

Springer - Publisher Connector

PubMed Central

OPENING THE DOOR TO THE LARGE SCALE USE OF CLINICAL LAB MEASURES FOR ASSOCIATION TESTING: EXPLORING DIFFERENT METHODS FOR DEFINING PHENOTYPES

Author: Christopher R Bauer
Daniel Lavage
J Matthew Mahoney
John Snyder
Joseph Leader
Sarah A Pendergrass
Publication venue
Publication date: 02/04/2020

The past decade has seen exponential growth in the numbers of sequenced and genotyped individuals and a corresponding increase in our ability to collect and catalogue phenotypic data for use in the clinic. We now face the challenge of integrating these diverse data in new ways that can provide useful diagnostics and precise medical interventions for individual patients. One of the first steps in this process is to accurately map the phenotypic consequences of the genetic variation in human populations. The most common approach for this is the genome-wide association study (GWAS). While this technique is relatively simple to implement for a given phenotype, the choice of how to define a phenotype is critical. It is becoming increasingly common for each individual in a GWAS cohort to have a large profile of quantitative measures. The standard approach is to test for associations with one measure at a time; however, there are many justifiable ways to define a set of phenotypes, and the genetic associations that are revealed will vary based on these definitions. Some phenotypes may only show a significant genetic association signal when considered together, such as through principal component analysis (PCA). Combining correlated measures may increase the power to detect association by reducing the noise present in individual variables, and may reduce the multiple hypothesis testing burden. Here we show that PCA and k-means clustering are two complementary methods for identifying novel genotype-phenotype relationships within a set of quantitative human traits derived from the Geisinger Health System electronic health record (EHR). Using a diverse set of approaches for defining phenotypes may yield more insights into the genetic architecture of complex traits, and the findings presented here highlight a clear need for further investigation into other methods for defining the most relevant phenotypes in a set of variables.
As EHR data continue to grow, addressing these issues will become increasingly important in our efforts to use genomic data effectively in medicine.
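
As a minimal illustration of how correlated quantitative measures can be combined into a single composite phenotype, the Python sketch below computes the first principal component of standardized measures by power iteration. It is a hedged sketch, not the authors' method or data: the function names and the toy lab values are hypothetical, it assumes every measure varies across individuals, and it covers only PCA, not the k-means clustering that the abstract also discusses.

```python
from math import sqrt


def standardize(column):
    """Z-score a column (assumes the measure is not constant)."""
    n = len(column)
    mean = sum(column) / n
    sd = sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_principal_component(rows, iterations=200):
    """Return (loadings, scores) of PC1 for the standardized columns of `rows`."""
    cols = [standardize(list(c)) for c in zip(*rows)]
    p, n = len(cols), len(rows)
    # Correlation matrix of the measures.
    corr = [
        [sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    # Power iteration from the first basis vector; the sign of the result
    # is arbitrary in general, as with any PCA.
    v = [1.0] + [0.0] * (p - 1)
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Each individual's composite phenotype is the projection onto PC1.
    scores = [sum(v[j] * cols[j][i] for j in range(p)) for i in range(n)]
    return v, scores


# Toy data: four individuals, two perfectly correlated lab measures.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
loadings, scores = first_principal_component(rows)
```

The resulting `scores` could then be tested for genetic association as one phenotype in place of the separate measures, which is one way to reduce the number of hypotheses tested.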

CiteSeerX

Genomic analyses with biofilter 2.0: knowledge driven filtering, annotation, and model development

Author: A Chatr-Aryamontri
Alex Frase
B Maher
BJ Grady
Carrie Moore
CF Thorn
Daniel Wolfe
John Wallace
K Kandasamy
L Licata
LA Hindorff
LR Meyer
M Ashburner
Marylyn D Ritchie
Neerja Katiyar
OL Griffith
R Cowper-Sal·lari
RA Haw
RD Finn
SA Pendergrass
Sarah A Pendergrass
SD Turner
WS Bush
Publication venue: Springer Science and Business Media LLC
Publication date

Crossref

Diverse Convergent Evidence in the Genetic Analysis of Complex Disease: Coordinating Omic, Informatic, and Experimental Evidence to Better Identify and Validate Risk Factors

Author: Amos Christopher I
Bartlett Jacquelaine
Ciesielski Timothy H
Gui Jiang
Huang Minjun
Kodaman Nuri
Li Jing
Moore Jason H
Pan Qinxin
Pendergrass Sarah A
Ritchie Marylyn D
Selleck Scott B
Sobota Rafal S
White Marquitta J
Williams Scott M
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014

In omic research, such as genome-wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately, this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues raise the question: How can we increase the amount of knowledge gained from high throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof of principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods, this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions.

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)